Abstract: We analyze the (unconditional) distribution of a linear predictor 
that is constructed after a data-driven model selection step in a linear regres- 
sion model. First, we derive the exact finite-sample cumulative distribution 
function (cdf) of the linear predictor, and a simple approximation to this (com- 
plicated) cdf. We then analyze the large-sample limit behavior of these cdfs, 
in the fixed-parameter case and under local alternatives. 

1. Introduction 

The analysis of the unconditional distribution of linear predictors after model
selection given in this paper complements and completes the results of Leeb [1], where
the corresponding conditional distribution is considered, conditional on the
outcome of the model selection step. The present paper builds on Leeb [1] as far as
finite-sample results are concerned. For a large-sample analysis, however, we
cannot rely on that paper; the limit behavior of the unconditional cdf differs from that
of the conditional cdfs so that a separate analysis is necessary. For a review of the
relevant related literature and for an outline of applications of our results, we refer
the reader to Leeb [1].

We consider a linear regression model $Y = X\theta + u$ with normal errors. (The
normal linear model facilitates a detailed finite-sample analysis. Also note that
asymptotic properties of the Gaussian location model can be generalized to a much
larger model class including nonlinear models and models for dependent data, as
long as appropriate standard regularity conditions guaranteeing asymptotic normality
of the maximum likelihood estimator are satisfied.) We consider model selection
by a sequence of 'general-to-specific' hypothesis tests; that is, starting from the
overall model, a sequence of tests is used to simplify the model. The cdf of a linear
function of the post-model-selection estimator (properly scaled and centered) is
denoted by $G_{n,\theta,\sigma}(t)$. The notation suggests that this cdf depends on the sample size
$n$, the regression parameter $\theta$, and on the error variance $\sigma^2$. An explicit formula
for $G_{n,\theta,\sigma}(t)$ is given in (3.10) below. From this formula, we see that the distribution
of, say, a linear predictor after model selection is significantly different from
(and more complex than) the Gaussian distribution that one would get without
model selection. Because the cdf $G_{n,\theta,\sigma}(t)$ is quite difficult to analyze directly, we
also provide a uniform asymptotic approximation to this cdf. This approximation,
which we shall denote by $G^*_{n,\theta,\sigma}(t)$, is obtained by considering an 'idealized'
scenario where the error variance $\sigma^2$ is treated as known and is used by the model
selection procedure. The approximating cdf $G^*_{n,\theta,\sigma}(t)$ is much simpler and allows
us to observe the main effects of model selection. Moreover, this approximation
allows us to study the large-sample limit behavior of $G_{n,\theta,\sigma}(t)$ not only in the
fixed-parameter case but also along sequences of parameters. The consideration of
asymptotics along sequences of parameters is necessitated by a complication that
seems to be inherent to post-model-selection estimators: Convergence of the
finite-sample distributions to the large-sample limit distribution is non-uniform in the
underlying parameters. (See Corollary 5.5 in Leeb and Pötscher and Appendix B in
Leeb and Pötscher ^].) For applications like the computation of large-sample limit
minimal coverage probabilities, it therefore appears to be necessary to study the
limit behavior of $G_{n,\theta,\sigma}(t)$ along sequences of parameters $\theta^{(n)}$ and $\sigma^{(n)}$. We
characterize all accumulation points of $G_{n,\theta^{(n)},\sigma^{(n)}}(t)$ for such sequences (with respect
to weak convergence). Ex post, it turns out that, as far as possible accumulation
points are concerned, it suffices to consider only a particular class of parameter
sequences, namely local alternatives. Of course, the large-sample limit behavior of
$G_{n,\theta,\sigma}(t)$ in the fixed-parameter case is contained in this analysis. Besides, we also
consider the model selection probabilities, i.e., the probabilities of selecting each
candidate model under consideration, in the finite-sample and in the large-sample
limit case.

The remainder of the paper is organized as follows: In Section 2, we describe
the basic framework of our analysis and the quantities of interest: The
post-model-selection estimator $\tilde\theta$ and the cdf $G_{n,\theta,\sigma}(t)$. Besides, we also introduce the
'idealized post-model-selection estimator' $\tilde\theta^*$ and the cdf $G^*_{n,\theta,\sigma}(t)$, which
correspond to the case where the error variance is known. In Section 3, we derive
finite-sample expansions of the aforementioned cdfs, and we discuss and illustrate the
effects of the model selection step in finite samples. Section 4 contains an approximation
result which shows that $G_{n,\theta,\sigma}(t)$ and $G^*_{n,\theta,\sigma}(t)$ are asymptotically uniformly close to
each other. With this, we can analyze the large-sample limit behavior of the two cdfs in
Section 5. All proofs are relegated to the appendices.

2. The model and estimators 

Consider the linear regression model

$$Y = X\theta + u, \tag{2.1}$$

where $X$ is a non-stochastic $n \times P$ matrix with $\mathrm{rank}(X) = P$ and $u \sim N(0, \sigma^2 I_n)$,
$\sigma^2 > 0$. Here $n$ denotes the sample size and we assume $n > P \ge 1$. In addition,
we assume that $Q = \lim_{n\to\infty} X'X/n$ exists and is non-singular (this assumption
is not needed in its full strength for all of the asymptotic results; cf.
Remark 2.1). Similarly as in Pötscher [6], we consider model selection from a
collection of nested models $M_{\mathcal{O}} \subseteq M_{\mathcal{O}+1} \subseteq \dots \subseteq M_P$ which are given by
$M_p = \{(\theta_1, \dots, \theta_P)' \in \mathbb{R}^P : \theta_{p+1} = \dots = \theta_P = 0\}$ $(0 \le p \le P)$. Hence, the model $M_p$
corresponds to the situation where only the first $p$ regressors in (2.1) are included.
For the most parsimonious model under consideration, i.e., for $M_{\mathcal{O}}$, we assume that
$\mathcal{O}$ satisfies $0 \le \mathcal{O} < P$; if $\mathcal{O} > 0$, this model contains those components of the
parameter that will not be subject to model selection. Note that $M_0 = \{(0, \dots, 0)'\}$
and $M_P = \mathbb{R}^P$. We call $M_p$ the regression model of order $p$.

The following notation will prove useful. For matrices $B$ and $C$ of the same
row-dimension, the column-wise concatenation of $B$ and $C$ is denoted by $(B : C)$.
If $D$ is an $m \times P$ matrix, let $D[p]$ denote the matrix of the first $p$ columns of $D$.
Similarly, let $D[\neg p]$ denote the matrix of the last $P - p$ columns of $D$. If $x$ is a
$P \times 1$ (column-) vector, we write in abuse of notation $x[p]$ and $x[\neg p]$ for $(x'[p])'$ and
$(x'[\neg p])'$, respectively. (We shall use these definitions also in the 'boundary' cases
$p = 0$ and $p = P$. It will always be clear from the context how expressions like
$D[0]$, $D[\neg P]$, $x[0]$, or $x[\neg P]$ are to be interpreted.) As usual the $i$-th component of
a vector $x$ will be denoted by $x_i$; in a similar fashion, denote the entry in the $i$-th
row and $j$-th column of a matrix $B$ by $B_{i,j}$.

The restricted least-squares estimator for $\theta$ under the restriction $\theta[\neg p] = 0$ will
be denoted by $\tilde\theta(p)$, $0 \le p \le P$ (in case $p = P$, the restriction is void, of course).
Note that $\tilde\theta(p)$ is given by the $P \times 1$ vector whose first $p$ components are given
by $(X[p]'X[p])^{-1}X[p]'Y$, and whose last $P - p$ components are equal to zero; the
expressions $\tilde\theta(0)$ and $\tilde\theta(P)$, respectively, are to be interpreted as the zero-vector
in $\mathbb{R}^P$ and as the unrestricted least-squares estimator for $\theta$. Given a parameter
vector $\theta$ in $\mathbb{R}^P$, the order of $\theta$, relative to the set of models $M_0, \dots, M_P$, is defined
as $p_0(\theta) = \min\{p : 0 \le p \le P,\ \theta \in M_p\}$. Hence, if $\theta$ is the true parameter vector,
only models $M_p$ of order $p \ge p_0(\theta)$ are correct models, and $M_{p_0(\theta)}$ is the most
parsimonious correct model for $\theta$ among $M_0, \dots, M_P$. We stress that $p_0(\theta)$ is a
property of a single parameter, and hence needs to be distinguished from the notion
of the order of the model $M_p$ introduced earlier, which is a property of the set of
parameters $M_p$.
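
As a small illustration of the order of a parameter vector, the following sketch computes it as the position of the last non-zero component. The function name and the tolerance argument are assumptions made here for illustration; they do not appear in the paper.

```python
def order_of_parameter(theta, tol=0.0):
    """Return the order of theta: the smallest p such that theta lies in M_p.

    theta lies in M_p exactly when its components p+1, ..., P vanish, so the
    order is the (1-based) position of the last non-zero component, or 0 for
    the zero vector. `tol` is an illustrative tolerance (hypothetical), not
    part of the definition in the text.
    """
    for p in range(len(theta), 0, -1):
        if abs(theta[p - 1]) > tol:
            return p
    return 0
```

For instance, a vector whose only non-zero entry is the second one has order 2, so that only the models of order 2 and higher are correct for it.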

A model selection procedure in general is now nothing else than a data-driven
(measurable) rule $\hat p$ that selects a value from $\{\mathcal{O}, \dots, P\}$ and thus selects a model
from the list of candidate models $M_{\mathcal{O}}, \dots, M_P$. In this paper, we shall consider a
model selection procedure based on a sequence of 'general-to-specific' hypothesis
tests, which is given as follows: The sequence of hypotheses $H_0^p : p_0(\theta) < p$ is tested
against the alternatives $H_1^p : p_0(\theta) = p$ in decreasing order starting at $p = P$. If,
for some $p > \mathcal{O}$, $H_0^p$ is the first hypothesis in the process that is rejected, we set
$\hat p = p$. If no rejection occurs until even $H_0^{\mathcal{O}+1}$ is accepted, we set $\hat p = \mathcal{O}$. Each
hypothesis in this sequence is tested by a kind of t-test where the error variance is
always estimated from the overall model. More formally, we have

$$\hat p = \max\{p : |T_p| \ge c_p,\ 0 \le p \le P\},$$

where the test-statistics are given by $T_0 = 0$ and by $T_p = \sqrt{n}\,\tilde\theta_p(p)/(\hat\sigma\,\xi_{n,p})$ with

$$\xi_{n,p} = \left( \left[ \left( \frac{X[p]'X[p]}{n} \right)^{-1} \right]_{p,p} \right)^{1/2} \qquad (0 < p \le P) \tag{2.2}$$

being the (non-negative) square root of the $p$-th diagonal element of the matrix
indicated and with $\hat\sigma^2 = (n - P)^{-1}(Y - X\tilde\theta(P))'(Y - X\tilde\theta(P))$ (cf. also Remark 6.2
in Leeb [1] concerning other variance estimators). The critical values $c_p$ are
independent of sample size (cf., however, Remark 2.1) and satisfy $0 < c_p < \infty$ for
$\mathcal{O} < p \le P$. We also set $c_{\mathcal{O}} = 0$ in order to restrict $\hat p$ to the range of candidate
models under consideration, i.e., to $\{\mathcal{O}, \mathcal{O}+1, \dots, P\}$. Note that under the hypothesis
$H_0^p$ the statistic $T_p$ is t-distributed with $n - P$ degrees of freedom for $0 < p \le P$.
The so defined model selection procedure $\hat p$ is conservative (or over-consistent):
The probability of selecting an incorrect model, i.e., the probability of the event
$\{\hat p < p_0(\theta)\}$, converges to zero as the sample size increases; the probability of
selecting a correct (but possibly over-parameterized) model, i.e., the probability of
the event $\{\hat p = p\}$ for $p$ satisfying $\max\{p_0(\theta), \mathcal{O}\} \le p \le P$, converges to a positive
limit; cf. (5.6) below.
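
The general-to-specific rule can be sketched in a few lines, assuming the test statistics have already been computed. The function and argument names below are hypothetical; the sketch only encodes the rule that the selected order is the largest order whose test rejects, and the smallest candidate order if no test rejects.

```python
def general_to_specific(t_stats, crit, lowest_order=0):
    """Select the order of the model from given test statistics.

    t_stats[p] and crit[p] hold the statistic and the critical value of the
    test for order p, for p = 0, ..., P (entries at or below `lowest_order`
    are placeholders and are never inspected). All names are illustrative.
    """
    P = len(t_stats) - 1
    for p in range(P, lowest_order, -1):   # test order P first, then P-1, ...
        if abs(t_stats[p]) >= crit[p]:     # first rejection fixes the order
            return p
    return lowest_order                    # no rejection: smallest candidate model


# Three regressors, all subject to selection, common critical value 2.
selected = general_to_specific([0.0, 5.0, 0.3, 1.0], [0.0, 2.0, 2.0, 2.0])
```

Here the tests for orders 3 and 2 accept and the test for order 1 rejects, so `selected` is 1. Stopping at the first rejection is what makes the procedure 'general-to-specific': a large statistic at a low order is never examined once a higher-order test has rejected.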

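The post-model-selection estimator $\tilde\theta$ is now defined as follows: On the event
$\hat p = p$, $\tilde\theta$ is given by the restricted least-squares estimator $\tilde\theta(p)$, i.e.,

$$\tilde\theta = \sum_{p=\mathcal{O}}^{P} \tilde\theta(p)\,\mathbf{1}\{\hat p = p\}. \tag{2.3}$$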

To study the distribution of a linear function of $\tilde\theta$, let $A$ be a non-stochastic $k \times P$
matrix of rank $k$ $(1 \le k \le P)$. Examples for $A$ include the case where $A$ equals a
$1 \times P$ (row-) vector $x_f$ if the object of interest is the linear predictor $x_f\theta$, or the
case where $A = (I_s : 0)$, say, if the object of interest is an $s \times 1$ subvector of $\theta$. We
shall consider the cdf

$$G_{n,\theta,\sigma}(t) = P_{n,\theta,\sigma}\left(\sqrt{n}A(\tilde\theta - \theta) \le t\right) \qquad (t \in \mathbb{R}^k). \tag{2.4}$$

Here and in the following, $P_{n,\theta,\sigma}(\cdot)$ denotes the probability measure corresponding
to a sample of size $n$ from (2.1) under the true parameters $\theta$ and $\sigma$. For convenience
we shall refer to (2.4) as the cdf of $A\tilde\theta$, although (2.4) is in fact the cdf of an affine
transformation of $A\tilde\theta$.

For theoretical reasons we shall also be interested in the idealized model selection
procedure which assumes knowledge of $\sigma^2$ and hence uses $T^*_p$ instead of $T_p$, where
$T^*_p = \sqrt{n}\,\tilde\theta_p(p)/(\sigma\,\xi_{n,p})$, $0 < p \le P$, and $T^*_0 = 0$. The corresponding model selector
is denoted by $\hat p^*$ and the resulting idealized 'post-model-selection estimator' by $\tilde\theta^*$.
Note that under the hypothesis $H_0^p$ the variable $T^*_p$ is standard normally distributed
for $0 < p \le P$. The corresponding cdf will be denoted by $G^*_{n,\theta,\sigma}(t)$, i.e.,

$$G^*_{n,\theta,\sigma}(t) = P_{n,\theta,\sigma}\left(\sqrt{n}A(\tilde\theta^* - \theta) \le t\right) \qquad (t \in \mathbb{R}^k). \tag{2.5}$$

For convenience we shall also refer to (2.5) as the cdf of $A\tilde\theta^*$.

Remark 2.1. Some of the assumptions introduced above are made only to simplify
the exposition and can hence easily be relaxed. This includes, in particular, the
assumption that the critical values $c_p$ used by the model selection procedure do not
depend on sample size, and the assumption that the regressor matrix $X$ is such that
$X'X/n$ converges to a positive definite limit $Q$ as $n \to \infty$. For the finite-sample
results in Section 3 below, these assumptions are clearly inconsequential. Moreover,
for the large-sample limit results in Sections 4 and 5 below, these assumptions can
be relaxed considerably. For the details, see Remark 6.1(i)-(iii) in Leeb [1], which
also applies, mutatis mutandis, to the results in the present paper.

3. Finite-sample results 

Some further preliminaries are required before we can proceed. The expected value
of the restricted least-squares estimator $\tilde\theta(p)$ will be denoted by $\eta_n(p)$ and is given
by the $P \times 1$ vector

$$\eta_n(p) = \begin{pmatrix} \theta[p] + (X[p]'X[p])^{-1}X[p]'X[\neg p]\,\theta[\neg p] \\ (0, \dots, 0)' \end{pmatrix} \tag{3.1}$$

with the conventions that $\eta_n(0) = (0, \dots, 0)' \in \mathbb{R}^P$ and $\eta_n(P) = \theta$. Furthermore, let
$\Phi_{n,p}(t)$, $t \in \mathbb{R}^k$, denote the cdf of $\sqrt{n}A(\tilde\theta(p) - \eta_n(p))$, i.e., $\Phi_{n,p}(t)$ is the cdf of a
centered Gaussian random vector with covariance matrix $\sigma^2 A[p](X[p]'X[p]/n)^{-1}A[p]'$
in case $p > 0$, and the cdf of point-mass at zero in $\mathbb{R}^k$ in case $p = 0$. If $p > 0$ and
if the matrix $A[p]$ has rank $k$, then $\Phi_{n,p}(t)$ has a density with respect to Lebesgue
measure, and we shall denote this density by $\phi_{n,p}(t)$. We note that $\eta_n(p)$ depends
on $\theta$ and that $\Phi_{n,p}(t)$ depends on $\sigma$ (in case $p > 0$), although these dependencies
are not shown explicitly in the notation.

For $p > 0$, the conditional distribution of $\sqrt{n}\,\tilde\theta_p(p)$ given $\sqrt{n}A(\tilde\theta(p) - \eta_n(p)) = z$
is a Gaussian distribution with mean $\sqrt{n}\,\eta_{n,p}(p) + b_{n,p}z$ and variance $\sigma^2\zeta^2_{n,p}$, where

$$b_{n,p} = C_n^{(p)\prime} \left( A[p](X[p]'X[p]/n)^{-1}A[p]' \right)^{-}, \quad\text{and} \tag{3.2}$$

$$\zeta^2_{n,p} = \xi^2_{n,p} - b_{n,p}\,C_n^{(p)}. \tag{3.3}$$

In the displays above, $C_n^{(p)}$ stands for $A[p](X[p]'X[p]/n)^{-1}e_p$, with $e_p$ denoting
the $p$-th standard basis vector in $\mathbb{R}^p$, and $(A[p](X[p]'X[p]/n)^{-1}A[p]')^{-}$ denotes a
generalized inverse of the matrix indicated (cf. Note 3(v) in Section 8a.2 of Rao [7]).
Note that, in general, the quantity $b_{n,p}z$ depends on the choice of generalized inverse
in (3.2); however, for $z$ in the column-space of $A[p]$, $b_{n,p}z$ is invariant under the
choice of inverse; cf. Lemma A.2 in Leeb [1]. Since $\sqrt{n}A(\tilde\theta(p) - \eta_n(p))$ lies in the
column-space of $A[p]$, the conditional distribution of $\sqrt{n}\,\tilde\theta_p(p)$ given
$\sqrt{n}A(\tilde\theta(p) - \eta_n(p)) = z$ is thus well-defined by the above. Observe that the vector of covariances
between $A\tilde\theta(p)$ and $\tilde\theta_p(p)$ is given by $\sigma^2 n^{-1} C_n^{(p)}$. In particular, note that $A\tilde\theta(p)$ and
$\tilde\theta_p(p)$ are uncorrelated if and only if $\zeta^2_{n,p} = \xi^2_{n,p}$ (or, equivalently, if and only if
$b_{n,p}z = 0$ for all $z$ in the column-space of $A[p]$); again, see Lemma A.2 in Leeb [1].

Finally, for $M$ denoting a univariate Gaussian random variable with zero mean
and variance $s^2 \ge 0$, we abbreviate the probability $P(|M - a| < b)$ by $\Delta_s(a, b)$,
$a \in \mathbb{R} \cup \{-\infty, \infty\}$, $b \in \mathbb{R}$. Note that $\Delta_s(\cdot, \cdot)$ is symmetric around zero in its first
argument, and that $\Delta_s(-\infty, b) = \Delta_s(\infty, b) = 0$ holds. In case $s = 0$, $M$ is to be
interpreted as being equal to zero, such that $\Delta_0(a, b)$ equals one if $|a| < b$ and zero
otherwise; i.e., $\Delta_0(a, b)$ reduces to an indicator function.
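
The abbreviation just introduced is easy to evaluate numerically. The following is a minimal sketch using only the Python standard library; the function name `delta` and its argument order (standard deviation first) are choices made here, not notation from the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def delta(s, a, b):
    """P(|M - a| < b) for M Gaussian with mean zero and standard deviation s.

    For s == 0, M is the constant 0 and the result is an indicator.
    Infinite centres a and non-positive half-widths b give probability zero.
    """
    if a in (float("inf"), float("-inf")) or b <= 0:
        return 0.0
    if s == 0:
        return 1.0 if abs(a) < b else 0.0
    # |M - a| < b  is the event  a - b < M < a + b
    return _STD_NORMAL.cdf((a + b) / s) - _STD_NORMAL.cdf((a - b) / s)
```

With unit standard deviation this reduces to the difference of two standard Gaussian cdf values, which is the form used in the illustrative example of Section 3.3.2.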



3.1. The known-variance case 

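The cdf $G^*_{n,\theta,\sigma}(t)$ can be expanded as a weighted sum of conditional cdfs,
conditional on the outcome of the model selection step, where the weights are given by
the corresponding model selection probabilities. To this end, let $G^*_{n,\theta,\sigma}(t|p)$ denote
the conditional cdf of $\sqrt{n}A(\tilde\theta^* - \theta)$ given that $\hat p^*$ equals $p$ for $\mathcal{O} \le p \le P$; that
is, $G^*_{n,\theta,\sigma}(t|p) = P_{n,\theta,\sigma}(\sqrt{n}A(\tilde\theta^* - \theta) \le t \mid \hat p^* = p)$, with $t \in \mathbb{R}^k$. Moreover, let
$\pi^*_{n,\theta,\sigma}(p) = P_{n,\theta,\sigma}(\hat p^* = p)$ denote the corresponding model selection probability.
Then the unconditional cdf $G^*_{n,\theta,\sigma}(t)$ can be written as

$$G^*_{n,\theta,\sigma}(t) = \sum_{p=\mathcal{O}}^{P} G^*_{n,\theta,\sigma}(t|p)\,\pi^*_{n,\theta,\sigma}(p). \tag{3.4}$$

Explicit finite-sample formulas for $G^*_{n,\theta,\sigma}(t|p)$, $\mathcal{O} \le p \le P$, are given in Leeb [1],
equations (10) and (13). Let $\gamma(\xi_{n,q}, s) = \Delta_{\sigma\xi_{n,q}}(\sqrt{n}\,\eta_{n,q}(q),\ s c_q \sigma \xi_{n,q})$, and
$\gamma^*(\zeta_{n,q}, z, s) = \Delta_{\sigma\zeta_{n,q}}(\sqrt{n}\,\eta_{n,q}(q) + b_{n,q}z,\ s c_q \sigma \xi_{n,q})$. It is elementary to verify that
$\pi^*_{n,\theta,\sigma}(\mathcal{O})$ is given by

$$\pi^*_{n,\theta,\sigma}(\mathcal{O}) = \prod_{q=\mathcal{O}+1}^{P} \gamma(\xi_{n,q}, 1) \tag{3.5}$$

while, for $p > \mathcal{O}$, we have

$$\pi^*_{n,\theta,\sigma}(p) = \left(1 - \gamma(\xi_{n,p}, 1)\right) \times \prod_{q=p+1}^{P} \gamma(\xi_{n,q}, 1). \tag{3.6}$$

(This follows by arguing as in the discussion leading up to (12) of Leeb [1], and by
using Proposition 3.1 of that paper.) Observe that the model selection probability
$\pi^*_{n,\theta,\sigma}(p)$ is always positive for each $p$, $\mathcal{O} \le p \le P$.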
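
The product formulas (3.5) and (3.6) can be evaluated directly once each factor is written on a standardized scale. The sketch below assumes that, for each order q above the smallest candidate order, the user supplies the standardized mean of the q-th test statistic (the mean of the restricted estimator's q-th component, scaled by root-n and divided by its standard deviation) together with the critical value; these inputs and all names are hypothetical conveniences, not notation from the paper.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf


def accept_prob(nu, c):
    """Probability that a N(nu, 1) statistic falls strictly inside (-c, c)."""
    return _PHI(nu + c) - _PHI(nu - c)


def selection_probs_known(nus, crits, lowest_order=0):
    """Known-variance model selection probabilities, following (3.5)-(3.6).

    nus[i] and crits[i] belong to order q = lowest_order + 1 + i.
    Returns a dict {p: probability of selecting order p}.
    """
    P = lowest_order + len(nus)
    g = {lowest_order + 1 + i: accept_prob(nu, c)
         for i, (nu, c) in enumerate(zip(nus, crits))}
    probs = {}
    for p in range(lowest_order, P + 1):
        tail = 1.0
        for q in range(p + 1, P + 1):
            tail *= g[q]                                   # all tests above p accept
        head = 1.0 if p == lowest_order else 1.0 - g[p]    # the test at p rejects
        probs[p] = head * tail
    return probs


probs = selection_probs_known([0.3, 1.0, 2.5], [2.0, 2.0, 2.0])
```

The probabilities telescope to one, and each of them is strictly positive, in line with the observation above.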

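Plugging the formulas for the conditional cdfs obtained in Leeb [1] and the above
formulas for the model selection probabilities into (3.4), we obtain that $G^*_{n,\theta,\sigma}(t)$ is
given by

$$\begin{aligned}
G^*_{n,\theta,\sigma}(t) ={}& \Phi_{n,\mathcal{O}}\left(t - \sqrt{n}A(\eta_n(\mathcal{O}) - \theta)\right) \prod_{q=\mathcal{O}+1}^{P} \gamma(\xi_{n,q}, 1) \\
&+ \sum_{p=\mathcal{O}+1}^{P} \int_{z \le t - \sqrt{n}A(\eta_n(p) - \theta)} \left(1 - \gamma^*(\zeta_{n,p}, z, 1)\right) \Phi_{n,p}(dz) \times \prod_{q=p+1}^{P} \gamma(\xi_{n,q}, 1).
\end{aligned} \tag{3.7}$$

In the above display, $\Phi_{n,p}(dz)$ denotes integration with respect to the measure
induced by the cdf $\Phi_{n,p}(t)$ on $\mathbb{R}^k$.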



3.2. The unknown-variance case 



Similar to the known-variance case, define $G_{n,\theta,\sigma}(t|p) = P_{n,\theta,\sigma}(\sqrt{n}A(\tilde\theta - \theta) \le t \mid \hat p = p)$
and $\pi_{n,\theta,\sigma}(p) = P_{n,\theta,\sigma}(\hat p = p)$, $\mathcal{O} \le p \le P$. Then $G_{n,\theta,\sigma}(t)$ can be
expanded as the sum of the terms $G_{n,\theta,\sigma}(t|p)\,\pi_{n,\theta,\sigma}(p)$ for $p = \mathcal{O}, \dots, P$, similar to
(3.4).

For the model selection probabilities, we argue as in Section 3.2 of Leeb and
Pötscher (sj to obtain that $\pi_{n,\theta,\sigma}(\mathcal{O})$ equals

$$\pi_{n,\theta,\sigma}(\mathcal{O}) = \int_0^\infty \prod_{q=\mathcal{O}+1}^{P} \gamma(\xi_{n,q}, s)\,h(s)\,ds, \tag{3.8}$$

where $h$ denotes the density of $\hat\sigma/\sigma$, i.e., $h$ is the density of $(n - P)^{-1/2}$ times
the square-root of a chi-square distributed random variable with $n - P$ degrees of
freedom. In a similar fashion, for $p > \mathcal{O}$, we get

$$\pi_{n,\theta,\sigma}(p) = \int_0^\infty \left(1 - \gamma(\xi_{n,p}, s)\right) \prod_{q=p+1}^{P} \gamma(\xi_{n,q}, s)\,h(s)\,ds; \tag{3.9}$$

cf. the argument leading up to (18) in Leeb [1]. As in the known-variance case, the
model selection probabilities are all positive.
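
For a single pre-test, the integral in (3.8) can be approximated by elementary quadrature. The sketch below is one possible numerical treatment under stated assumptions: the closed form of the density h is derived here from the chi-square density, while the composite Simpson rule, the upper integration limit, and all function names are choices made for illustration only.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf


def h_density(s, df):
    """Density at s of sqrt(W / df), where W is chi-square with df degrees of freedom."""
    if s < 0.0:
        return 0.0
    if s == 0.0:
        return math.sqrt(2.0 / math.pi) if df == 1 else 0.0
    log_h = (math.log(2.0) + (df / 2.0) * math.log(df / 2.0) - math.lgamma(df / 2.0)
             + (df - 1.0) * math.log(s) - df * s * s / 2.0)
    return math.exp(log_h)


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule; `steps` must be even."""
    width = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * width)
    return total * width / 3.0


def prob_smallest_model_unknown(nu, c, df, upper=8.0):
    """Approximate (3.8) for one pre-test with standardized mean nu and critical value c."""
    def integrand(s):
        return (_PHI(nu + s * c) - _PHI(nu - s * c)) * h_density(s, df)
    return simpson(integrand, 0.0, upper)
```

With a zero standardized mean, the integral is the probability that a t-distributed statistic with `df` degrees of freedom stays inside the critical region, which gives a convenient sanity check: for `df = 5` and `c = 2.015` the result should be close to 0.90.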

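Using the formulas for the conditional cdfs $G_{n,\theta,\sigma}(t|p)$, $\mathcal{O} \le p \le P$, given in Leeb
[1], equations (14) and (16)-(18), the unconditional cdf $G_{n,\theta,\sigma}(t)$ is thus seen to be
given by

$$\begin{aligned}
G_{n,\theta,\sigma}(t) ={}& \Phi_{n,\mathcal{O}}\left(t - \sqrt{n}A(\eta_n(\mathcal{O}) - \theta)\right) \int_0^\infty \prod_{q=\mathcal{O}+1}^{P} \gamma(\xi_{n,q}, s)\,h(s)\,ds \\
&+ \sum_{p=\mathcal{O}+1}^{P} \int_{z \le t - \sqrt{n}A(\eta_n(p) - \theta)} \left[ \int_0^\infty \left(1 - \gamma^*(\zeta_{n,p}, z, s)\right) \prod_{q=p+1}^{P} \gamma(\xi_{n,q}, s)\,h(s)\,ds \right] \Phi_{n,p}(dz).
\end{aligned} \tag{3.10}$$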



Observe that $G_{n,\theta,\sigma}(t)$ is in fact a smoothed version of $G^*_{n,\theta,\sigma}(t)$: Indeed, the
right-hand side of the formula (3.10) for $G_{n,\theta,\sigma}(t)$ is obtained by taking the
right-hand side of formula (3.7) for $G^*_{n,\theta,\sigma}(t)$, changing the last argument of $\gamma(\xi_{n,q}, 1)$
and $\gamma^*(\zeta_{n,q}, z, 1)$ from 1 to $s$ for $q = \mathcal{O}+1, \dots, P$, integrating with respect to
$h(s)\,ds$, and interchanging the order of integration. Similar considerations apply,
mutatis mutandis, to the model selection probabilities $\pi_{n,\theta,\sigma}(p)$ and $\pi^*_{n,\theta,\sigma}(p)$ for
$\mathcal{O} \le p \le P$.

3.3. Discussion 

3.3.1. General Observations 

The cdfs $G^*_{n,\theta,\sigma}(t)$ and $G_{n,\theta,\sigma}(t)$ need not have densities with respect to Lebesgue
measure on $\mathbb{R}^k$. However, densities do exist if $\mathcal{O} > 0$ and the matrix $A[\mathcal{O}]$ has rank
$k$. In that case, the density of $G_{n,\theta,\sigma}(t)$ is given by

$$\begin{aligned}
&\phi_{n,\mathcal{O}}\left(t - \sqrt{n}A(\eta_n(\mathcal{O}) - \theta)\right) \int_0^\infty \prod_{q=\mathcal{O}+1}^{P} \gamma(\xi_{n,q}, s)\,h(s)\,ds \\
&\quad + \sum_{p=\mathcal{O}+1}^{P} \left[ \int_0^\infty \left(1 - \gamma^*\left(\zeta_{n,p},\ t - \sqrt{n}A(\eta_n(p) - \theta),\ s\right)\right) \prod_{q=p+1}^{P} \gamma(\xi_{n,q}, s)\,h(s)\,ds \right] \phi_{n,p}\left(t - \sqrt{n}A(\eta_n(p) - \theta)\right).
\end{aligned} \tag{3.11}$$

(Given that $\mathcal{O} > 0$ and that $A[\mathcal{O}]$ has rank $k$, we see that $A[p]$ has rank $k$ and
that the Lebesgue density $\phi_{n,p}(t)$ of $\Phi_{n,p}(t)$ exists for each $p = \mathcal{O}, \dots, P$. We hence
may write the integrals with respect to $\Phi_{n,p}(dz)$ in (3.10) as integrals with respect
to $\phi_{n,p}(z)\,dz$. Differentiating the resulting formula for $G_{n,\theta,\sigma}(t)$ with respect to $t$,
we get (3.11).) Similarly, the Lebesgue density of $G^*_{n,\theta,\sigma}(t)$ can be obtained by
differentiating the right-hand side of (3.7), provided that $\mathcal{O} > 0$ and $A[\mathcal{O}]$ has rank
$k$. Conversely, if that condition is violated, then some of the conditional cdfs are
degenerate and Lebesgue densities do not exist. (Note that on the event $\hat p = p$, $A\tilde\theta$
equals $A\tilde\theta(p)$, and recall that the last $P - p$ coordinates of $\tilde\theta(p)$ are constant equal to
zero. Therefore $A\tilde\theta(0)$ is the zero-vector in $\mathbb{R}^k$ and, for $p > 0$, $A\tilde\theta(p)$ is concentrated
in the column space of $A[p]$. On the event $\hat p^* = p$, a similar argument applies to
$A\tilde\theta^*$.)

Both cdfs $G^*_{n,\theta,\sigma}(t)$ and $G_{n,\theta,\sigma}(t)$ are given by a weighted sum of conditional
cdfs, cf. (3.7) and (3.10), where the weights are given by the model-selection
probabilities (which are always positive in finite samples). For a detailed discussion of
the conditional cdfs, the reader is referred to Section 3.3 of Leeb [1].

The cdf $G_{n,\theta,\sigma}(t)$ is typically highly non-Gaussian. A notable exception where
$G_{n,\theta,\sigma}(t)$ reduces to the Gaussian cdf $\Phi_{n,P}(t)$ for each $\theta \in \mathbb{R}^P$ occurs in the special
case where $\tilde\theta_p(p)$ is uncorrelated with $A\tilde\theta(p)$ for each $p = \mathcal{O}+1, \dots, P$. In this case,
we have $A\tilde\theta(p) = A\tilde\theta(P)$ for each $p = \mathcal{O}, \dots, P$ (cf. the discussion following (20) in
Leeb [1]). From this and in view of (2.3), it immediately follows that $G_{n,\theta,\sigma}(t) =
\Phi_{n,P}(t)$, independent of $\theta$ and $\sigma$. (The same considerations apply, mutatis mutandis,
to $G^*_{n,\theta,\sigma}(t)$.) Clearly, this case is rather special, because it entails that fitting the
overall model with $P$ regressors gives the same estimator for $A\theta$ as fitting the
restricted model with $\mathcal{O}$ regressors only.

To compare the distribution of a linear function of the post-model-selection
estimator with the distribution of the post-model-selection estimator itself, note that
the cdf of $\tilde\theta$ can be studied in our framework by setting $A$ equal to $I_P$ (and $k$
equal to $P$). Obviously, the distribution of $\tilde\theta$ does not have a density with respect
to Lebesgue measure. Moreover, $\tilde\theta_p(p)$ is always perfectly correlated with $\tilde\theta(p)$ for
each $p = 1, \dots, P$, such that the special case discussed above cannot occur (for $A$
equal to $I_P$).

3.3.2. An illustrative example 

We now exemplify the possible shapes of the finite-sample distributions in a simple
setting. To this end, we set $P = 2$, $\mathcal{O} = 1$, $A = (1, 0)$, and $k = 1$ for the rest of this
section. The choice of $P = 2$ gives a special case of the model (2.1), namely

$$Y_i = \theta_1 X_{i,1} + \theta_2 X_{i,2} + u_i \qquad (1 \le i \le n). \tag{3.12}$$

With $\mathcal{O} = 1$, the first regressor is always included in the model, and a pre-test will
be employed to decide whether or not to include the second one. The two model
selectors $\hat p$ and $\hat p^*$ thus decide between two candidate models, $M_1 = \{(\theta_1, \theta_2)' \in
\mathbb{R}^2 : \theta_2 = 0\}$ and $M_2 = \{(\theta_1, \theta_2)' \in \mathbb{R}^2\}$. The critical value for the test between
$M_1$ and $M_2$, i.e., $c_2$, will be chosen later (recall that we have set $c_{\mathcal{O}} = c_1 = 0$).
With our choice of $A = (1, 0)$, we see that $G_{n,\theta,\sigma}(t)$ and $G^*_{n,\theta,\sigma}(t)$ are the cdfs of
$\sqrt{n}(\tilde\theta_1 - \theta_1)$ and $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$, respectively.

Since the matrix $A[\mathcal{O}]$ has rank one and $k = 1$, the cdfs of $\sqrt{n}(\tilde\theta_1 - \theta_1)$ and
$\sqrt{n}(\tilde\theta^*_1 - \theta_1)$ both have Lebesgue densities. To obtain a convenient expression for
these densities, we write $\sigma^2(X'X/n)^{-1}$, i.e., the covariance matrix of the
least-squares estimator based on the overall model (3.12), as

$$\sigma^2 \left( \frac{X'X}{n} \right)^{-1} = \begin{pmatrix} \sigma_1^2 & \sigma_{1,2} \\ \sigma_{1,2} & \sigma_2^2 \end{pmatrix}.$$



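The elements of this matrix depend on sample size $n$, but we shall suppress
this dependence in the notation. It will prove useful to define $\rho = \sigma_{1,2}/(\sigma_1\sigma_2)$,
i.e., $\rho$ is the correlation coefficient between the least-squares estimators for $\theta_1$ and
$\theta_2$ in model (3.12). Note that here we have $\phi_{n,2}(t) = \sigma_1^{-1}\phi(t/\sigma_1)$ and $\phi_{n,1}(t) =
\sigma_1^{-1}(1 - \rho^2)^{-1/2}\,\phi\left(t(1 - \rho^2)^{-1/2}/\sigma_1\right)$ with $\phi(t)$ denoting the univariate standard
Gaussian density. The density of $\sqrt{n}(\tilde\theta_1 - \theta_1)$ is given by

$$\begin{aligned}
&\phi_{n,1}\left(t + \sqrt{n}\theta_2\rho\sigma_1/\sigma_2\right) \int_0^\infty \Delta_1\left(\sqrt{n}\theta_2/\sigma_2,\ s c_2\right) h(s)\,ds \\
&\quad + \phi_{n,2}(t) \int_0^\infty \left(1 - \Delta_1\left( \frac{\sqrt{n}\theta_2/\sigma_2 + \rho t/\sigma_1}{\sqrt{1 - \rho^2}},\ \frac{s c_2}{\sqrt{1 - \rho^2}} \right)\right) h(s)\,ds;
\end{aligned} \tag{3.13}$$

recall that $\Delta_1(a, b)$ is equal to $\Phi(a + b) - \Phi(a - b)$, where $\Phi$ denotes the standard
univariate Gaussian cdf, and note that here $h(s)$ denotes the density of $(n - 2)^{-1/2}$
times the square-root of a chi-square distributed random variable with $n - 2$ degrees
of freedom. Similarly, the density of $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$ is given by

$$\phi_{n,1}\left(t + \sqrt{n}\theta_2\rho\sigma_1/\sigma_2\right) \Delta_1\left(\sqrt{n}\theta_2/\sigma_2,\ c_2\right)
+ \phi_{n,2}(t) \left(1 - \Delta_1\left( \frac{\sqrt{n}\theta_2/\sigma_2 + \rho t/\sigma_1}{\sqrt{1 - \rho^2}},\ \frac{c_2}{\sqrt{1 - \rho^2}} \right)\right). \tag{3.14}$$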

Note that both densities depend on the regression parameter $(\theta_1, \theta_2)'$ only through
$\theta_2$, and that these densities depend on the error variance $\sigma^2$ and on the regressor
matrix $X$ only through $\sigma_1$, $\sigma_2$, and $\rho$. Also note that the expressions in (3.13) and
(3.14) are unchanged if $\rho$ is replaced by $-\rho$ and, at the same time, the argument $t$
is replaced by $-t$. Similarly, replacing $\theta_2$ and $t$ by $-\theta_2$ and $-t$, respectively, leaves
(3.13) and (3.14) unchanged. The same applies also to the conditional densities
considered below; cf. (3.15) and (3.16). We therefore consider only non-negative
values of $\rho$ and $\theta_2$ in the numerical examples below.

From (3.14) we can also read off the conditional densities of $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$,
conditional on selecting the model $M_p$ for $p = 1$ and $p = 2$, which will be useful
later: The unconditional cdf of $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$ is the weighted sum of two conditional
cdfs, conditional on selecting the model $M_1$ and $M_2$, respectively, weighted by the
corresponding model selection probabilities; cf. (3.4) and the attending discussion.
Hence, the unconditional density is the sum of the conditional densities multiplied
by the corresponding model selection probabilities. In the simple setting considered
here, the probability of $\hat p^*$ selecting $M_1$, i.e., $\pi^*_{n,\theta,\sigma}(1)$, equals $\Delta_1(\sqrt{n}\theta_2/\sigma_2, c_2)$ in
view of (3.5) and because $\mathcal{O} = 1$, and $\pi^*_{n,\theta,\sigma}(2) = 1 - \pi^*_{n,\theta,\sigma}(1)$. Thus, conditional
on selecting the model $M_1$, the density of $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$ is given by

$$\phi_{n,1}\left(t + \sqrt{n}\theta_2\rho\sigma_1/\sigma_2\right). \tag{3.15}$$

Conditional on selecting $M_2$, the density of $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$ equals

$$\phi_{n,2}(t)\ \frac{1 - \Delta_1\left( \left(\sqrt{n}\theta_2/\sigma_2 + \rho t/\sigma_1\right)/\sqrt{1 - \rho^2},\ c_2/\sqrt{1 - \rho^2} \right)}{1 - \Delta_1\left(\sqrt{n}\theta_2/\sigma_2,\ c_2\right)}. \tag{3.16}$$

This can be viewed as a 'deformed' version of $\phi_{n,2}(t)$, i.e., the density of
$\sqrt{n}(\tilde\theta_1(2) - \theta_1)$, where the deformation is governed by the fraction in (3.16). The conditional
densities of $\sqrt{n}(\tilde\theta_1 - \theta_1)$ can be obtained and interpreted in a similar fashion from
(3.13), upon observing that $\pi_{n,\theta,\sigma}(1)$ here equals $\int_0^\infty \Delta_1(\sqrt{n}\theta_2/\sigma_2, s c_2)\,h(s)\,ds$ in
view of (3.8).
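
The known-variance densities of this example are simple enough to evaluate with the standard library. The following sketch codes the two conditional densities and recovers the unconditional density as their mixture; the function names, the argument order, and the Simpson quadrature used for checking are assumptions made here for illustration, and the correlation must satisfy |rho| < 1.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def delta1(a, b):
    """Difference of standard Gaussian cdf values at a + b and a - b."""
    return _STD.cdf(a + b) - _STD.cdf(a - b)


def cond_density_m1(t, n, theta2, s1, s2, rho):
    """Conditional density given that the smaller model is selected, as in (3.15)."""
    sd = s1 * math.sqrt(1.0 - rho * rho)
    return _STD.pdf((t + math.sqrt(n) * theta2 * rho * s1 / s2) / sd) / sd


def cond_density_m2(t, n, theta2, s1, s2, rho, c2):
    """Conditional density given that the larger model is selected, as in (3.16)."""
    root = math.sqrt(1.0 - rho * rho)
    nc = math.sqrt(n) * theta2 / s2
    fraction = (1.0 - delta1((nc + rho * t / s1) / root, c2 / root)) / (1.0 - delta1(nc, c2))
    return _STD.pdf(t / s1) / s1 * fraction


def uncond_density(t, n, theta2, s1, s2, rho, c2):
    """Unconditional density (3.14), written as a mixture of the two conditionals."""
    w1 = delta1(math.sqrt(n) * theta2 / s2, c2)   # probability of the smaller model
    return (w1 * cond_density_m1(t, n, theta2, s1, s2, rho)
            + (1.0 - w1) * cond_density_m2(t, n, theta2, s1, s2, rho, c2))


def simpson(f, lo, hi, steps=4000):
    """Composite Simpson rule; `steps` must be even."""
    width = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * width)
    return total * width / 3.0
```

Integrating each of the three functions over a wide interval should return a value close to one, and evaluating `uncond_density` at `-t` with `-rho` should reproduce its value at `t` with `rho`, which is the symmetry noted in the text.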

Figure 1 illustrates some typical shapes of the densities of $\sqrt{n}(\tilde\theta_1 - \theta_1)$ and
$\sqrt{n}(\tilde\theta^*_1 - \theta_1)$ given in (3.13) and (3.14), respectively, for $\rho = 0.75$, $n = 7$, and
for various values of $\theta_2$. Note that the densities of $\sqrt{n}(\tilde\theta_1 - \theta_1)$ and $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$,
corresponding to the unknown-variance case and the (idealized) known-variance
case, are very close to each other. In fact, the small sample size, i.e., $n = 7$, was
chosen because for larger $n$ these two densities are visually indistinguishable in
plots as in Figure 1 (this phenomenon is analyzed in detail in the next section). For
$\theta_2 = 0$ in Figure 1, the density of $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$, although seemingly close to being
Gaussian, is in fact a mixture of a Gaussian density and a bimodal density; this is
explained in detail below. For the remaining values of $\theta_2$ considered in Figure 1,
the density of $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$ is clearly non-Gaussian, namely skewed in case $\theta_2 = 0.1$,
bimodal in case $\theta_2 = 0.75$, and highly non-symmetric in case $\theta_2 = 1.2$. Overall, we
see that the finite-sample density of $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$ can exhibit a variety of different
shapes. Exactly the same applies to the density of $\sqrt{n}(\tilde\theta_1 - \theta_1)$. As a point of
interest, we note that these different shapes occur for values of $\theta_2$ in a quite narrow
range: For example, in the setting of Figure 1, the uniformly most powerful test of
the hypothesis $\theta_2 = 0$ against $\theta_2 > 0$ with level 0.95, i.e., a one-sided t-test, has a
power of only 0.27 at the alternative $\theta_2 = 1.2$. This suggests that estimating the
distribution of $\sqrt{n}(\tilde\theta_1 - \theta_1)$ is difficult here. (See also Leeb and Pötscher ^] as well
as Leeb and Pötscher Q for a thorough analysis of this difficulty.)

Fig 1. The densities of $\sqrt{n}(\tilde\theta_1 - \theta_1)$ (black solid line) and of $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$ (black dashed line)
for the indicated values of $\theta_2$, $n = 7$, $\rho = 0.75$, and $\sigma_1 = \sigma_2 = 1$. The critical value of the test
between $M_1$ and $M_2$ was set to $c_2 = 2.015$, corresponding to a t-test with significance level 0.9.
For reference, the gray curves are Gaussian densities $\phi_{n,1}(t)$ (larger peak) and $\phi_{n,2}(t)$ (smaller
peak).

We stress that the phenomena shown in Figure 1 are not caused by the small
sample size, i.e., $n = 7$. This becomes clear upon inspection of (3.13) and (3.14),
which depend on $\theta_2$ through $\sqrt{n}\theta_2$ (for fixed $\sigma_1$, $\sigma_2$ and $\rho$). Hence, for other values
of $n$, one obtains plots essentially similar to Figure 1, provided that the range of
values of $\theta_2$ is adapted accordingly.

We now show how the shape of the unconditional densities can be explained by
the shapes of the conditional densities together with the model selection
probabilities. Since the unknown-variance case and the known-variance case are very similar
as seen above, we focus on the latter. In Figure 2 below, we give the conditional
densities of $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$, conditional on selecting the model $M_p$, $p = 1, 2$, cf. (3.15)
and (3.16), and the corresponding model selection probabilities in the same setting
as in Figure 1.

[Figure 2. Panel headers recovered from the extraction: theta2 = 0 (0.96), theta2 = 0.1 (0.95); horizontal axis from -4 to 4.]

Fig 2. The conditional density of $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$, conditional on selecting model $M_1$ (black solid
line), and conditional on selecting model $M_2$ (black dashed line), for the same parameters as used
for Figure 1. The number in parentheses in each panel header is the probability of selecting $M_1$,
i.e., $\pi^*_{n,\theta,\sigma}(1)$. The gray curves are as in Figure 1.

The unconditional densities of $\sqrt{n}(\tilde\theta^*_1 - \theta_1)$ in each panel of Figure 1 are the sum
of the two conditional densities in the corresponding panel in Figure 2, weighted by
the model selection probabilities, i.e., $\pi^*_{n,\theta,\sigma}(1)$ and $\pi^*_{n,\theta,\sigma}(2)$. In other words, in each
panel of Figure 2, the solid black curve gets the weight given in parentheses, and the
dashed black curve gets one minus that weight. In case $\theta_2 = 0$, the probability of
selecting model $M_1$ is very large, and the corresponding conditional density (solid
curve) is the dominant factor in the unconditional density in Figure 1. For $\theta_2 = 0.1$,
the situation is similar if slightly less pronounced. In case $\theta_2 = 0.75$, the solid and
the dashed curve in Figure 2 get approximately equal weight, i.e., 0.51 and 0.49,
respectively, resulting in a bimodal unconditional density in Figure 1. Finally, in
case $\theta_2 = 1.2$, the weight of the solid curve is 0.12 while that of the dashed curve is
0.88; the resulting unconditional density in Figure 1 is unimodal but has a 'hump' in
the left tail. For a detailed discussion of the conditional distributions and densities
themselves, we refer to Section 3.3 of Leeb [1].
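
The weights quoted for the four panels can be reproduced from the known-variance selection probability of the smaller model, using the settings stated for Figure 1 (sample size 7, unit standard deviations, critical value 2.015). The sketch below is illustrative; the function name and its defaults are choices made here.

```python
import math
from statistics import NormalDist


def weight_m1(theta2, n=7, sigma2=1.0, c2=2.015):
    """Known-variance probability of selecting the smaller model in the example."""
    nc = math.sqrt(n) * theta2 / sigma2
    phi = NormalDist().cdf
    return phi(nc + c2) - phi(nc - c2)


weights = {theta2: round(weight_m1(theta2), 2) for theta2 in (0.0, 0.1, 0.75, 1.2)}
# weights == {0.0: 0.96, 0.1: 0.95, 0.75: 0.51, 1.2: 0.12}
```

Because the probability depends on the second coefficient only through its product with the square root of the sample size, `weight_m1(0.375, n=28)` agrees with `weight_m1(0.75, n=7)` up to rounding error, which illustrates why the same shapes reappear at other sample sizes once the coefficient is rescaled.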

Results similar to Figure 1 and Figure 2 can be obtained for any other sample
size (by appropriate choice of $\theta_2$ as noted above), and also for other choices of
the critical value $c_2$ that is used by the model selectors. Larger values of $c_2$ result
in model selectors that more strongly favor the smaller model $M_1$, and for which
the phenomena observed above are more pronounced (see also Section 2.1 of Leeb
and Pötscher Q for results on the case where the critical value increases with
sample size). Concerning the correlation coefficient $\rho$, we find that the shape of
the conditional and of the unconditional densities is very strongly influenced by
the magnitude of $|\rho|$, which we have chosen as $\rho = 0.75$ in Figures 1 and 2 above.
For larger values of $|\rho|$ we get similar but more pronounced phenomena. As $|\rho|$
gets smaller, however, these phenomena tend to be less pronounced. For example,
if we plot the unconditional densities as in Figure 1 but with $\rho = 0.25$, we get
four rather similar curves which altogether roughly resemble a Gaussian density
except for some skewness. This is in line with the observation made in Section 3.3.1
that the unconditional distributions are Gaussian in the special case where $\tilde\theta_p(p)$ is
uncorrelated with $A\tilde\theta(p)$ for each $p = \mathcal{O}+1, \dots, P$. In the simple setting considered
here, we have, in particular, that the distribution of $\sqrt{n}(\tilde\theta_1 - \theta_1)$ is Gaussian in the
special case where $\rho = 0$.

4. An approximation result 

In Theorem 4.2 below, we show that $G^*_{n,\theta,\sigma}(t)$ is close to $G_{n,\theta,\sigma}(t)$ in large samples,
uniformly in the underlying parameters, where closeness is with respect to the
total variation distance. (A similar result is provided in Leeb [1] for the conditional
cdfs under slightly stronger assumptions.) Theorem 4.2 will be instrumental in the
large-sample analysis in Section 5, because the large-sample behavior of $G^*_{n,\theta,\sigma}(t)$ is
significantly easier to analyze. The total variation distance of two cdfs $G$ and $G^*$ on
$\mathbb{R}^k$ will be denoted by $\|G - G^*\|_{TV}$ in the following. (Note that the relation
$|G(t) - G^*(t)| \le \|G - G^*\|_{TV}$ always holds for each $t \in \mathbb{R}^k$. Thus, if $G$ and $G^*$ are close
with respect to the total variation distance, then $G(t)$ is close to $G^*(t)$, uniformly
in $t$. We shall use the total variation distance also for distribution functions $G$ and
$G^*$ which are not necessarily normalized, i.e., in the case where $G$ and $G^*$ are the
distribution functions of finite measures with total mass possibly different from
one.)
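
As a numerical illustration of the bound $|G(t) - G^*(t)| \le \|G - G^*\|_{TV}$, the following Python sketch compares two univariate Gaussian cdfs. It is not taken from the paper: the two distributions and the helper name `tv_and_sup_distance` are hypothetical, and the total variation distance is assumed to be normalized as $\sup_B |\mu(B) - \mu^*(B)|$, which equals half the $L_1$ distance of the densities.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Cdf of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def tv_and_sup_distance(mu1, s1, mu2, s2, lo=-12.0, hi=12.0, steps=24001):
    """Return (total variation distance, sup-distance of the cdfs) on a grid.

    Assumption: TV is normalized as half the L1 distance of the densities.
    The integral is approximated by the trapezoidal rule on [lo, hi].
    """
    h = (hi - lo) / (steps - 1)
    grid = [lo + i * h for i in range(steps)]
    abs_diff = [abs(normal_pdf(x, mu1, s1) - normal_pdf(x, mu2, s2)) for x in grid]
    l1 = h * (sum(abs_diff) - 0.5 * (abs_diff[0] + abs_diff[-1]))
    sup = max(abs(normal_cdf(x, mu1, s1) - normal_cdf(x, mu2, s2)) for x in grid)
    return 0.5 * l1, sup


tv, sup_dist = tv_and_sup_distance(0.0, 1.0, 0.5, 1.5)
print(f"TV distance: {tv:.4f}, sup-distance of cdfs: {sup_dist:.4f}")
```

In this example the two densities cross twice, so the supremum over half-lines is strictly smaller than the total variation distance; for a pure location shift of a Gaussian the two quantities coincide.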

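Since the unconditional cdfs $G_{n,\theta,\sigma}(t)$ and $G^*_{n,\theta,\sigma}(t)$ can be linearly expanded in
terms of $G_{n,\theta,\sigma}(t|p)\pi_{n,\theta,\sigma}(p)$ and $G^*_{n,\theta,\sigma}(t|p)\pi^*_{n,\theta,\sigma}(p)$, respectively, a key step for
the results in this section is the following lemma.

Lemma 4.1. For each $p$, $\mathcal{O} \le p \le P$, we have

(4.1)
$$\sup_{\theta \in \mathbb{R}^P,\ \sigma > 0} \bigl\| G_{n,\theta,\sigma}(\cdot|p)\,\pi_{n,\theta,\sigma}(p) - G^*_{n,\theta,\sigma}(\cdot|p)\,\pi^*_{n,\theta,\sigma}(p) \bigr\|_{TV} \xrightarrow{\,n \to \infty\,} 0.$$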

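This lemma immediately leads to the following result.

Theorem 4.2. For the unconditional cdfs $G_{n,\theta,\sigma}(t)$ and $G^*_{n,\theta,\sigma}(t)$ we have

(4.2)
$$\sup_{\theta \in \mathbb{R}^P,\ \sigma > 0} \bigl\| G_{n,\theta,\sigma} - G^*_{n,\theta,\sigma} \bigr\|_{TV} \xrightarrow{\,n \to \infty\,} 0.$$

Moreover, for each $p$ satisfying $\mathcal{O} \le p \le P$, the model selection probabilities
$\pi_{n,\theta,\sigma}(p)$ and $\pi^*_{n,\theta,\sigma}(p)$ satisfy

$$\sup_{\theta \in \mathbb{R}^P,\ \sigma > 0} \bigl| \pi_{n,\theta,\sigma}(p) - \pi^*_{n,\theta,\sigma}(p) \bigr| \xrightarrow{\,n \to \infty\,} 0.$$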

By Theorem 4.2 we have, in particular, that

$$\sup_{\theta \in \mathbb{R}^P,\ \sigma > 0}\ \sup_{t \in \mathbb{R}^k} \bigl| G_{n,\theta,\sigma}(t) - G^*_{n,\theta,\sigma}(t) \bigr| \xrightarrow{\,n \to \infty\,} 0;$$

that is, the cdf $G_{n,\theta,\sigma}(t)$ is closely approximated by $G^*_{n,\theta,\sigma}(t)$ if $n$ is sufficiently
large, uniformly in the argument $t$ and uniformly in the parameters $\theta$ and $\sigma$. The
result in Theorem 4.2 does not depend on the scaling factor $\sqrt{n}$ and on the centering
constant $A\theta$ that are used in the definitions of $G_{n,\theta,\sigma}(t)$ and $G^*_{n,\theta,\sigma}(t)$, cf. (2.4) and
(2.5), respectively. In fact, that result continues to hold for arbitrary measurable
transformations of $\tilde\theta$ and $\tilde\theta^*$. (See Corollary A.1 below for a precise formulation.)

Leeb [1] gives a result paralleling (4.2) for the conditional distributions of $A\tilde\theta$ and
$A\tilde\theta^*$, conditional on the outcome of the model selection step. That result establishes
closeness of the corresponding conditional cdfs uniformly not over the whole
parameter space but over a slightly restricted set of parameters; cf. Theorem 4.1 in
Leeb [1]. This restriction arose from the need to control the behavior of ratios of
probabilities which vanish asymptotically. (Indeed, the probability of selecting the
model of order $p$ converges to zero as $n \to \infty$ if the selected model is incorrect;
cf. (5.6) below.) In the unconditional case considered in Theorem 4.2 above, this
difficulty does not arise, allowing us to avoid this restriction.

5. Asymptotic results for the unconditional distributions and for the 
selection probabilities 

We now analyze the large-sample limit behavior of $G_{n,\theta,\sigma}(t)$ and $G^*_{n,\theta,\sigma}(t)$, both
in the fixed-parameter case where $\theta$ and $\sigma$ are kept fixed while $n$ goes to infinity,
and along sequences of parameters $\theta^{(n)}$ and $\sigma^{(n)}$. The main result in this section is
Proposition 5.1 below. Inter alia, this result gives a complete characterization of all
accumulation points of the unconditional cdfs (with respect to weak convergence)
along sequences of parameters; cf. Remark 5.5. Our analysis also includes the model
selection probabilities, as well as the case of local-alternative and fixed-parameter
asymptotics.

The following conventions will be employed throughout this section: For $p$
satisfying $0 < p < P$, partition $Q = \lim_{n\to\infty} X'X/n$ as

$$Q = \begin{pmatrix} Q[p:p] & Q[p:\neg p] \\ Q[\neg p:p] & Q[\neg p:\neg p] \end{pmatrix},$$

where $Q[p:p]$ is a $p \times p$ matrix. Let $\Phi_{\infty,p}(t)$ be the cdf of a $k$-variate centered
Gaussian random vector with covariance matrix $\sigma^2 A[p]Q[p:p]^{-1}A[p]'$, $0 < p \le P$,
and let $\Phi_{\infty,0}(t)$ denote the cdf of point-mass at zero in $\mathbb{R}^k$. Note that $\Phi_{\infty,p}(t)$ has
a density with respect to Lebesgue measure on $\mathbb{R}^k$ if $p > 0$ and the matrix $A[p]$ has
rank $k$; in this case, we denote the Lebesgue density of $\Phi_{\infty,p}(t)$ by $\phi_{\infty,p}(t)$. Finally,
for $p = 1, \dots, P$, define the quantities

$$\xi^2_{\infty,p} = \bigl(Q[p:p]^{-1}\bigr)_{p,p},$$

$$\zeta^2_{\infty,p} = \xi^2_{\infty,p} - C_\infty^{(p)\prime}\bigl(A[p]Q[p:p]^{-1}A[p]'\bigr)^{-} C_\infty^{(p)},$$

$$b_{\infty,p} = C_\infty^{(p)\prime}\bigl(A[p]Q[p:p]^{-1}A[p]'\bigr)^{-},$$

where $C_\infty^{(p)} = A[p]Q[p:p]^{-1}e_p$, with $e_p$ denoting the $p$-th standard basis vector
in $\mathbb{R}^p$. As the notation suggests, $\Phi_{\infty,p}(t)$ is the large-sample limit of $\Phi_{n,p}(t)$; $C_\infty^{(p)}$,
$\xi_{\infty,p}$ and $\zeta_{\infty,p}$ are the limits of $C_n^{(p)}$, $\xi_{n,p}$ and $\zeta_{n,p}$, respectively; and $b_{n,p}z \to b_{\infty,p}z$
for each $z$ in the column-space of $A[p]$; cf. Lemma A.2 in Leeb [1]. With these
conventions, we can characterize the large-sample limit behavior of the unconditional
cdfs along sequences of parameters.

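Proposition 5.1. Consider sequences of parameters $\theta^{(n)} \in \mathbb{R}^P$ and $\sigma^{(n)} > 0$, such
that $\sqrt{n}\,\theta^{(n)}$ converges to a limit $\psi \in (\mathbb{R} \cup \{-\infty, \infty\})^P$, and such that $\sigma^{(n)}$ converges
to a (finite) limit $\sigma > 0$ as $n \to \infty$. Let $p_*$ denote the largest index $p$, $\mathcal{O} < p \le P$,
for which $|\psi_p| = \infty$, and set $p_* = \mathcal{O}$ if no such index exists. Then $G^*_{n,\theta^{(n)},\sigma^{(n)}}(t)$
and $G_{n,\theta^{(n)},\sigma^{(n)}}(t)$ both converge weakly to a limit cdf which is given by

(5.1)
$$\begin{aligned}
&\Phi_{\infty,p_*}\bigl(t - A\delta^{(p_*)}\bigr) \prod_{q=p_*+1}^{P} \Delta_{\sigma\xi_{\infty,q}}\bigl(\delta_q^{(q)} + \psi_q,\ c_q\sigma\xi_{\infty,q}\bigr) \\
&\quad + \sum_{p=p_*+1}^{P} \int_{z \le t - A\delta^{(p)}} \Bigl(1 - \Delta_{\sigma\zeta_{\infty,p}}\bigl(\delta_p^{(p)} + \psi_p + b_{\infty,p}z,\ c_p\sigma\xi_{\infty,p}\bigr)\Bigr)\, \Phi_{\infty,p}(dz) \\
&\qquad\qquad \times \prod_{q=p+1}^{P} \Delta_{\sigma\xi_{\infty,q}}\bigl(\delta_q^{(q)} + \psi_q,\ c_q\sigma\xi_{\infty,q}\bigr),
\end{aligned}$$

where

(5.2)
$$\delta^{(p)} = \begin{pmatrix} Q[p:p]^{-1}Q[p:\neg p] \\ -I_{P-p} \end{pmatrix} \psi[\neg p],$$

$p_* \le p \le P$ (with the convention that $\delta^{(P)}$ is the zero-vector in $\mathbb{R}^P$ and, if necessary,
that $\delta^{(0)} = -\psi$). Note that $\delta^{(p)}$ is the limit of the bias of $\tilde\theta(p)$ scaled by $\sqrt{n}$, i.e.,
$\delta^{(p)} = \lim_{n\to\infty} \sqrt{n}(\eta_n(p) - \theta^{(n)})$, with $\eta_n(p)$ given by (3.1) with $\theta^{(n)}$ replacing $\theta$;
also note that $\delta^{(p)}$ is always finite, $p_* \le p \le P$.

The above statement continues to hold with convergence in total variation
replacing weak convergence in the case where $p_* > 0$ and the matrix $A[p_*]$ has rank
$k$, and in the case where $p_* < P$ and $\sqrt{n}A[\neg p_*]\theta^{(n)}[\neg p_*]$ is constant in $n$.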
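
The limiting scaled bias $\delta^{(p)}$ in (5.2) can be evaluated directly once $Q$ and $\psi$ are given. The following Python sketch is an illustration only, not part of the paper: the function names `solve` and `scaled_bias_limit`, the list-of-lists representation of $Q$, and the numerical values are all hypothetical. It uses plain Gauss-Jordan elimination in place of an explicit matrix inverse and covers the conventions $\delta^{(P)} = 0$ and $\delta^{(0)} = -\psi$.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def scaled_bias_limit(Q, psi, p):
    """Evaluate delta^(p) as in (5.2) for 0 <= p <= P.

    Q   : P x P matrix (list of lists), the limit of X'X/n
    psi : limit of sqrt(n) * theta^(n); the entries psi[p:] must be finite
    """
    P = len(psi)
    if p == P:
        return [0.0] * P                  # convention: delta^(P) is the zero vector
    tail = [float(v) for v in psi[p:]]    # psi[not p]
    if p == 0:
        return [-v for v in tail]         # convention: delta^(0) = -psi
    # Upper block: Q[p:p]^{-1} Q[p:not p] psi[not p]; lower block: -psi[not p].
    rhs = [sum(Q[i][p + j] * tail[j] for j in range(P - p)) for i in range(p)]
    Qpp = [row[:p] for row in Q[:p]]
    return solve(Qpp, rhs) + [-v for v in tail]


# Hypothetical two-regressor example with psi_1 infinite and psi_2 finite.
Q = [[1.0, 0.5], [0.5, 1.0]]
psi = [float("inf"), 2.0]
print(scaled_bias_limit(Q, psi, 1))
```

Only the components $\psi[\neg p]$ enter the computation, which is why an infinite entry in $\psi[p]$ is harmless for $p \ge p_*$.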

Remark 5.2. Observe that the limit cdf in (5.1) is of a similar form as the
finite-sample cdf $G^*_{n,\theta,\sigma}(t)$ as given in (3.7) (the only difference being that the right-hand
side of (3.7) is the sum of $P - \mathcal{O} + 1$ terms while (5.1) is the sum of $P - p_* + 1$
terms, that quantities depending on the regressor matrix through $X'X/n$ in (3.7)
are replaced by their corresponding limits in (5.1), and that the bias and mean
of $\sqrt{n}\,\tilde\theta(p)$ in (3.7) are replaced by the appropriate large-sample limits in (5.1)).
Therefore, the discussion of the finite-sample cdf $G^*_{n,\theta,\sigma}(t)$ given in Section 3.3
applies, mutatis mutandis, also to the limit cdf in (5.1). In particular, the cdf in
(5.1) has a density with respect to Lebesgue measure on $\mathbb{R}^k$ if (and only if) $p_* > 0$
and $A[p_*]$ has rank $k$; in that case, this density can be obtained from (5.1) by
differentiation. Moreover, we stress that the limit cdf is typically non-Gaussian.
A notable exception where (5.1) reduces to the Gaussian cdf $\Phi_{\infty,P}(t)$ occurs in
the special case where $\tilde\theta_q(q)$ and $A\tilde\theta(q)$ are asymptotically uncorrelated for each
$q = p_*+1, \dots, P$.

Inspecting the proof of Proposition 5.1, we also obtain the large-sample limit
behavior of the conditional cdfs weighted by the model selection probabilities, e.g.,
of $G_{n,\theta^{(n)},\sigma^{(n)}}(t|p)\,\pi_{n,\theta^{(n)},\sigma^{(n)}}(p)$ (weak convergence of not necessarily normalized
cdfs $H_n$ to a not necessarily normalized cdf $H$ on $\mathbb{R}^k$ is defined as follows: $H_n(t)$
converges to $H(t)$ at each continuity point $t$ of the limit cdf, and $H_n(\mathbb{R}^k)$, i.e., the
total mass of $H_n$ on $\mathbb{R}^k$, converges to $H(\mathbb{R}^k)$).

Corollary 5.3. Assume that the assumptions of Proposition 5.1 are met, and
fix $p$ with $\mathcal{O} \le p \le P$. In case $p = p_*$, $G_{n,\theta^{(n)},\sigma^{(n)}}(t|p_*)\,\pi_{n,\theta^{(n)},\sigma^{(n)}}(p_*)$
converges to the first term in (5.1) in the sense of weak convergence. If $p > p_*$,
$G_{n,\theta^{(n)},\sigma^{(n)}}(t|p)\,\pi_{n,\theta^{(n)},\sigma^{(n)}}(p)$ converges weakly to the term with index $p$ in the sum
in (5.1). Finally, if $p < p_*$, $G_{n,\theta^{(n)},\sigma^{(n)}}(t|p)\,\pi_{n,\theta^{(n)},\sigma^{(n)}}(p)$ converges to zero in total
variation. The same applies to $G^*_{n,\theta^{(n)},\sigma^{(n)}}(t|p)\,\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(p)$. Moreover, weak
convergence can be strengthened to convergence in total variation in the case where
$p > 0$ and $A[p]$ has rank $k$ (in that case, the weighted conditional cdf also has a
Lebesgue density), and in the case where $p < P$ and $\sqrt{n}A[\neg p]\theta^{(n)}[\neg p]$ is constant
in $n$.

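Proposition 5.4. Under the assumptions of Proposition 5.1, the large-sample limit
behavior of the model selection probabilities $\pi_{n,\theta^{(n)},\sigma^{(n)}}(p)$, $\mathcal{O} \le p \le P$, is as follows:

For each $p$ satisfying $p_* < p \le P$, $\pi_{n,\theta^{(n)},\sigma^{(n)}}(p)$ converges to

(5.3)
$$\Bigl(1 - \Delta_{\sigma\xi_{\infty,p}}\bigl(\delta_p^{(p)} + \psi_p,\ c_p\sigma\xi_{\infty,p}\bigr)\Bigr) \prod_{q=p+1}^{P} \Delta_{\sigma\xi_{\infty,q}}\bigl(\delta_q^{(q)} + \psi_q,\ c_q\sigma\xi_{\infty,q}\bigr).$$

For $p = p_*$, $\pi_{n,\theta^{(n)},\sigma^{(n)}}(p_*)$ converges to

(5.4)
$$\prod_{q=p_*+1}^{P} \Delta_{\sigma\xi_{\infty,q}}\bigl(\delta_q^{(q)} + \psi_q,\ c_q\sigma\xi_{\infty,q}\bigr).$$

For each $p$ satisfying $\mathcal{O} \le p < p_*$, $\pi_{n,\theta^{(n)},\sigma^{(n)}}(p)$ converges to zero. The above
statements continue to hold with $\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(p)$ replacing $\pi_{n,\theta^{(n)},\sigma^{(n)}}(p)$.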

Remark 5.5. With Propositions 5.1 and 5.4 we obtain a complete characterization
of all possible accumulation points of the unconditional cdfs (with respect to weak
convergence) and of the model selection probabilities, along arbitrary sequences
of parameters $\theta^{(n)}$ and $\sigma^{(n)}$, provided that $\sigma^{(n)}$ is bounded away from zero and
infinity: Let $\theta^{(n)}$ be any sequence in $\mathbb{R}^P$ and let $\sigma^{(n)}$ be a sequence satisfying
$\sigma_* \le \sigma^{(n)} \le \sigma^*$ with $0 < \sigma_* \le \sigma^* < \infty$. Since the set $(\mathbb{R} \cup \{-\infty, \infty\})^P$ as well
as the set $[\sigma_*, \sigma^*]$ is compact, each subsequence contains a further subsequence
for which the assumptions of Propositions 5.1 and 5.4 are satisfied. For example,
each accumulation point of $G_{n,\theta^{(n)},\sigma^{(n)}}(t)$ (with respect to weak convergence) is of
the form (5.1), where here $\psi$ and $\sigma$ are accumulation points of $\sqrt{n}\,\theta^{(n)}$ and $\sigma^{(n)}$,
respectively (and where $p_*$ and the quantities $\delta^{(p)}$, $p_* \le p \le P$, are derived from
$\psi$ as in Proposition 5.1). Of course, the same is true for $G^*_{n,\theta^{(n)},\sigma^{(n)}}(t)$. The same
considerations apply, mutatis mutandis, to the weighted conditional cdfs considered
in Corollary 5.3.

To study, say, the large-sample limit minimal coverage probability of confidence
sets for $A\theta$ centered at $A\tilde\theta$, a description of all possible accumulation points of
$G_{n,\theta^{(n)},\sigma^{(n)}}(t)$ with respect to weak convergence is useful; here $\theta^{(n)}$ can be any
sequence in $\mathbb{R}^P$ and $\sigma^{(n)}$ can be any sequence bounded away from zero and infinity. In
view of Remark 5.5, we see that each individual accumulation point can be reached
along a particular sequence of regression parameters $\theta^{(n)}$, chosen such that the $\theta^{(n)}$
are within an $O(1/\sqrt{n})$ neighborhood of one of the models under consideration,
say, $M_{p_*}$ for some $\mathcal{O} \le p_* \le P$. In particular, in order to describe all possible
accumulation points of the unconditional cdf, it suffices to consider local alternatives
to $\theta$.

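Corollary 5.6. Fix $\theta \in \mathbb{R}^P$ and consider local alternatives of the form $\theta + \gamma/\sqrt{n}$,
where $\gamma \in \mathbb{R}^P$. Moreover, let $\sigma^{(n)}$ be a sequence of positive real numbers converging
to a (finite) limit $\sigma > 0$. Then Propositions 5.1 and 5.4 apply with $\theta + \gamma/\sqrt{n}$
replacing $\theta^{(n)}$, where here $p_*$ equals $\max\{p_0(\theta), \mathcal{O}\}$, and $\psi[\neg p_*]$ equals $\gamma[\neg p_*]$ (in
case $p_* < P$). In particular, $G^*_{n,\theta+\gamma/\sqrt{n},\sigma^{(n)}}(t)$ and $G_{n,\theta+\gamma/\sqrt{n},\sigma^{(n)}}(t)$ converge in
total variation to the cdf in (5.1) with $p_* = \max\{p_0(\theta), \mathcal{O}\}$.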

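In the case of fixed-parameter asymptotics, the large-sample limits of the model
selection probabilities and of the unconditional cdfs take a particularly simple form.
Fix $\theta \in \mathbb{R}^P$ and $\sigma > 0$. Clearly, $\sqrt{n}\,\theta$ converges to a limit $\psi$, whose $p_0(\theta)$-th
component is infinite if $p_0(\theta) > 0$ (because the $p_0(\theta)$-th component of $\theta$ is non-zero
in that case), and whose $p$-th component is zero for each $p > p_0(\theta)$. Therefore,
Propositions 5.1 and 5.4 apply with $p_* = \max\{p_0(\theta), \mathcal{O}\}$, and either with $p_* < P$
and $\psi[\neg p_*] = (0, \dots, 0)'$, or with $p_* = P$. In particular, $p_* = \max\{p_0(\theta), \mathcal{O}\}$ is the
order of the smallest correct model for $\theta$ among the candidate models $M_{\mathcal{O}}, \dots, M_P$.
We hence obtain that $G^*_{n,\theta,\sigma}(t)$ and $G_{n,\theta,\sigma}(t)$ converge in total variation to the cdf

(5.5)
$$\begin{aligned}
&\Phi_{\infty,p_*}(t) \prod_{q=p_*+1}^{P} \Delta_{\sigma\xi_{\infty,q}}\bigl(0,\ c_q\sigma\xi_{\infty,q}\bigr) \\
&\quad + \sum_{p=p_*+1}^{P} \int_{z \le t} \Bigl(1 - \Delta_{\sigma\zeta_{\infty,p}}\bigl(b_{\infty,p}z,\ c_p\sigma\xi_{\infty,p}\bigr)\Bigr)\, \Phi_{\infty,p}(dz)
 \times \prod_{q=p+1}^{P} \Delta_{\sigma\xi_{\infty,q}}\bigl(0,\ c_q\sigma\xi_{\infty,q}\bigr),
\end{aligned}$$

and the large-sample limit of the model selection probabilities $\pi_{n,\theta,\sigma}(p)$ and
$\pi^*_{n,\theta,\sigma}(p)$ for $\mathcal{O} \le p \le P$ is given by

(5.6)
$$\begin{cases}
\Bigl(1 - \Delta_{\sigma\xi_{\infty,p}}\bigl(0,\ c_p\sigma\xi_{\infty,p}\bigr)\Bigr) \displaystyle\prod_{q=p+1}^{P} \Delta_{\sigma\xi_{\infty,q}}\bigl(0,\ c_q\sigma\xi_{\infty,q}\bigr) & \text{if } p > p_*, \\[2ex]
\displaystyle\prod_{q=p_*+1}^{P} \Delta_{\sigma\xi_{\infty,q}}\bigl(0,\ c_q\sigma\xi_{\infty,q}\bigr) & \text{if } p = p_*, \\[2ex]
0 & \text{if } p < p_*,
\end{cases}$$

with $p_* = \max\{p_0(\theta), \mathcal{O}\}$.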
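
The fixed-parameter limits in (5.6) are simple enough to evaluate numerically. The Python sketch below is an illustration under stated assumptions rather than the paper's own computation: it assumes that $\Delta_s(a, b)$ denotes $\Phi((a+b)/s) - \Phi((a-b)/s)$, with $\Phi$ the standard normal cdf (the definition is given earlier in the paper and is not restated in this section), and the function names, the dictionary layout of the critical values, and the numerical inputs are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def delta(s, a, b):
    """Assumed form of Delta_s(a, b): Phi((a + b)/s) - Phi((a - b)/s)."""
    return std_normal_cdf((a + b) / s) - std_normal_cdf((a - b) / s)


def limit_selection_probs(c, p_star, O, P, sigma=1.0, xi=None):
    """Limits in (5.6) of the model selection probabilities for p = O, ..., P.

    c      : dict mapping q to the critical value c_q, for q = O+1, ..., P
    p_star : order of the smallest correct candidate model, max{p_0(theta), O}
    xi     : dict mapping q to xi_{infinity,q}; defaults to 1.0 (hypothetical)
    """
    xi = xi or {q: 1.0 for q in range(O + 1, P + 1)}

    def keep(q):
        # Limit probability that the test of theta_q = 0 does not reject.
        s = sigma * xi[q]
        return delta(s, 0.0, c[q] * s)

    probs = {}
    for p in range(O, P + 1):
        if p < p_star:
            probs[p] = 0.0                                  # incorrect models
        else:
            tail = math.prod(keep(q) for q in range(p + 1, P + 1))
            probs[p] = tail if p == p_star else (1.0 - keep(p)) * tail
    return probs


# Hypothetical example: O = 0, P = 3, smallest correct model of order 1.
probs = limit_selection_probs({1: 2.0, 2: 2.0, 3: 2.0}, p_star=1, O=0, P=3)
print(probs, sum(probs.values()))
```

Under the assumed form of $\Delta$, each factor $\Delta_{\sigma\xi_{\infty,q}}(0, c_q\sigma\xi_{\infty,q})$ equals $\Phi(c_q) - \Phi(-c_q)$, so in this sketch the fixed-parameter limits depend on the critical values only, and the limits sum to one.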
Remark 5.7. (i) In defining the cdf $G_{n,\theta,\sigma}(t)$, the estimator has been centered
at $\theta$ and scaled by $\sqrt{n}$; cf. (2.4). For the finite-sample results in Section 3, a
different choice of centering constant (or scaling factor) of course only amounts to a
translation (or rescaling) of the distribution and is hence inconsequential. Also, the
results in Section 4 do not depend on the centering constant and on the scaling
factor, because the total variation distance of two cdfs is invariant under a shift or
rescaling of the argument. More generally, Lemma 4.1 and Theorem 4.2 extend to
the distribution of arbitrary (measurable) functions of $\tilde\theta$ and $\tilde\theta^*$; cf. Corollary A.1
below.

(ii) We are next concerned with the question to which extent the limiting results
given in the current section are affected by the choice of the centering constant. Let
$d_{n,\theta,\sigma}$ denote a $P \times 1$ vector which may depend on $n$, $\theta$ and $\sigma$. Then centering at
$d_{n,\theta,\sigma}$ leads to

(5.7)
$$P_{n,\theta,\sigma}\bigl(\sqrt{n}A(\tilde\theta - d_{n,\theta,\sigma}) \le t\bigr) = G_{n,\theta,\sigma}\bigl(t + \sqrt{n}A(d_{n,\theta,\sigma} - \theta)\bigr).$$

The results obtained so far can now be used to describe the large-sample behavior
of the cdf in (5.7). In particular, assuming that $\sqrt{n}A(d_{n,\theta,\sigma} - \theta)$ converges to a limit
$\delta \in \mathbb{R}^k$, it is easy to verify that the large-sample limit of the cdf in (5.7) (in the
sense of weak convergence) is given by the cdf in (5.5) with $t + \delta$ replacing $t$. If
$\sqrt{n}A(d_{n,\theta,\sigma} - \theta)$ converges to a limit $\nu \in (\mathbb{R} \cup \{-\infty, \infty\})^k$ with some component
of $\nu$ being either $\infty$ or $-\infty$, then the limit of (5.7) will be degenerate in the sense
that at least one marginal distribution mass will have escaped to $\infty$ or $-\infty$. In
other words, if $i$ is such that $|\nu_i| = \infty$, then the $i$-th component of $\sqrt{n}A(\tilde\theta - d_{n,\theta,\sigma})$
converges to $-\nu_i$ in probability as $n \to \infty$. The marginal of (5.7) corresponding to
the finite components of $\nu$ converges weakly to the corresponding marginal of (5.5)
with the appropriate components of $t + \nu$ replacing the appropriate components of
$t$. This shows that, for an asymptotic analysis, any reasonable centering constant
typically must be such that $Ad_{n,\theta,\sigma}$ coincides with $A\theta$ up to terms of order $O(1/\sqrt{n})$.
If $\sqrt{n}A(d_{n,\theta,\sigma} - \theta)$ does not converge, accumulation points can be described by
considering appropriate subsequences. The same considerations apply to the cdf
$G^*_{n,\theta,\sigma}(t)$, and also to asymptotics along sequences of parameters $\theta^{(n)}$ and $\sigma^{(n)}$.
Appendix A: Proofs for Section 4

Proof of Lemma 4.1. Consider first the case where $p > \mathcal{O}$. In that case, it is easy
to see that $G_{n,\theta,\sigma}(t|p)\,\pi_{n,\theta,\sigma}(p)$ does not depend on the critical values $c_q$ for $q < p$
which are used by the model selection procedure $\hat p$ (cf. formula (3.9) above for
$\pi_{n,\theta,\sigma}(p)$ and the expression for $G_{n,\theta,\sigma}(t|p)$ given in (16)-(18) of Leeb [1]). As a
consequence, we conclude for $p > \mathcal{O}$ that $G_{n,\theta,\sigma}(t|p)\,\pi_{n,\theta,\sigma}(p)$ follows the same
formula irrespective of whether $\mathcal{O} = 0$ or $\mathcal{O} > 0$. The same applies, mutatis mutandis,
to $G^*_{n,\theta,\sigma}(t|p)\,\pi^*_{n,\theta,\sigma}(p)$. We hence may assume that $\mathcal{O} = 0$ in the following.

In the special case where $A$ is the $p \times P$ matrix $(I_p : 0)$ (which is to be interpreted
as $I_P$ in case $p = P$), (4.1) follows from Lemma 5.1 of Leeb and Pötscher [3].
(In that result the conditional cdfs are such that the estimators are centered at
$\eta_n(p)$ instead of $\theta$. However, this different centering constant does not affect the
total variation distance; cf. Lemma A.5 in Leeb [1].) For the case of general $A$,
write $\mu$ as shorthand for the conditional distribution of $\sqrt{n}(I_p : 0)(\tilde\theta - \theta)$ given
$\hat p = p$ multiplied by $\pi_{n,\theta,\sigma}(p)$, $\mu^*$ as shorthand for the conditional distribution of
$\sqrt{n}(I_p : 0)(\tilde\theta^* - \theta)$ given $\hat p^* = p$ multiplied by $\pi^*_{n,\theta,\sigma}(p)$, and let $\Psi$ denote the
mapping $z \mapsto ((A[p]z)' : (-\sqrt{n}A[\neg p]\theta[\neg p])')'$ in case $p < P$ and $z \mapsto Az$ in case
$p = P$. It is now easy to see that Lemma A.5 of Leeb [1] applies, and (4.1) follows.

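It remains to show that (4.1) also holds with $\mathcal{O}$ replacing $p$. Having established
(4.1) for $p > \mathcal{O}$, it also follows, for each $p = \mathcal{O}+1, \dots, P$, that

(A.1)
$$\sup_{\theta \in \mathbb{R}^P,\ \sigma > 0} \bigl| \pi_{n,\theta,\sigma}(p) - \pi^*_{n,\theta,\sigma}(p) \bigr| \xrightarrow{\,n \to \infty\,} 0,$$

because the modulus in (A.1) is bounded by

$$\bigl\| G_{n,\theta,\sigma}(\cdot|p)\,\pi_{n,\theta,\sigma}(p) - G^*_{n,\theta,\sigma}(\cdot|p)\,\pi^*_{n,\theta,\sigma}(p) \bigr\|_{TV}.$$

Since the model selection probabilities sum up to one, we have $\pi_{n,\theta,\sigma}(\mathcal{O}) = 1 -
\sum_{p=\mathcal{O}+1}^{P} \pi_{n,\theta,\sigma}(p)$, and a similar expansion holds for $\pi^*_{n,\theta,\sigma}(\mathcal{O})$. By this and the
triangle inequality, we see that (A.1) also holds with $\mathcal{O}$ replacing $p$. Now (4.1) with
$\mathcal{O}$ replacing $p$ follows immediately, because the conditional cdfs $G_{n,\theta,\sigma}(t|\mathcal{O})$ and
$G^*_{n,\theta,\sigma}(t|\mathcal{O})$ are both equal to $\Phi_{n,\mathcal{O}}(t - \sqrt{n}A(\eta_n(\mathcal{O}) - \theta))$, cf. (10) and (14) of Leeb
[1], which is of course bounded by one. □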



Proof of Theorem 4.2. Relation (4.2) follows from Lemma 4.1 by expanding
$G^*_{n,\theta,\sigma}(t)$ as in (3.4), by expanding $G_{n,\theta,\sigma}(t)$ in a similar fashion, and by applying
the triangle inequality. The statement concerning the model selection probabilities
has already been established in the course of the proof of Lemma 4.1; cf. (A.1) and
the attending discussion. □

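Corollary A.1. For each $n$, $\theta$ and $\sigma$, let $\Psi_{n,\theta,\sigma}(\cdot)$ be a measurable function on $\mathbb{R}^P$.
Moreover, let $R_{n,\theta,\sigma}(\cdot)$ denote the distribution of $\Psi_{n,\theta,\sigma}(\tilde\theta)$, and let $R^*_{n,\theta,\sigma}(\cdot)$ denote
the distribution of $\Psi_{n,\theta,\sigma}(\tilde\theta^*)$. (That is, say, $R_{n,\theta,\sigma}(\cdot)$ is the probability measure
induced by $\Psi_{n,\theta,\sigma}(\tilde\theta)$ under $P_{n,\theta,\sigma}(\cdot)$.) We then have

(A.2)
$$\sup_{\theta \in \mathbb{R}^P,\ \sigma > 0} \bigl\| R_{n,\theta,\sigma}(\cdot) - R^*_{n,\theta,\sigma}(\cdot) \bigr\|_{TV} \xrightarrow{\,n \to \infty\,} 0.$$

Moreover, if $R_{n,\theta,\sigma}(\cdot|p)$ and $R^*_{n,\theta,\sigma}(\cdot|p)$ denote the distributions of $\Psi_{n,\theta,\sigma}(\tilde\theta)$
conditional on $\hat p = p$ and of $\Psi_{n,\theta,\sigma}(\tilde\theta^*)$ conditional on $\hat p^* = p$, respectively, then

(A.3)
$$\sup_{\theta \in \mathbb{R}^P,\ \sigma > 0} \bigl\| R_{n,\theta,\sigma}(\cdot|p)\,\pi_{n,\theta,\sigma}(p) - R^*_{n,\theta,\sigma}(\cdot|p)\,\pi^*_{n,\theta,\sigma}(p) \bigr\|_{TV} \xrightarrow{\,n \to \infty\,} 0.$$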



Proof. Observe that the total variation distance of two cdfs is unaffected by a
change of scale or a shift of the argument. Using Theorem 4.2 with $A = I_P$, we
hence obtain that (A.2) holds if $\Psi_{n,\theta,\sigma}$ is the identity map. From this, the general
case follows immediately in view of Lemma A.5 of Leeb [1]. In a similar fashion,
(A.3) follows from Lemma 4.1. □
Appendix B: Proofs for Section 5

Under the assumptions of Proposition 5.1, we make the following preliminary
observation: For $p \ge p_*$, consider the scaled bias of $\tilde\theta(p)$, i.e., $\sqrt{n}(\eta_n(p) - \theta^{(n)})$, where
$\eta_n(p)$ is defined as in (3.1) with $\theta^{(n)}$ replacing $\theta$. It is easy to see that

$$\sqrt{n}\bigl(\eta_n(p) - \theta^{(n)}\bigr) = \begin{pmatrix} (X[p]'X[p])^{-1}X[p]'X[\neg p] \\ -I_{P-p} \end{pmatrix} \sqrt{n}\,\theta^{(n)}[\neg p],$$

where the expression on the right-hand side is to be interpreted as $-\sqrt{n}\,\theta^{(n)}$ and as
the zero vector in $\mathbb{R}^P$ in the cases $p = 0$ and $p = P$, respectively. For $p$ satisfying
$p_* \le p < P$, note that $\sqrt{n}\,\theta^{(n)}[\neg p]$ converges to $\psi[\neg p]$ by assumption, and that this
limit is finite by choice of $p \ge p_*$. It hence follows that $\sqrt{n}(\eta_n(p) - \theta^{(n)})$ converges
to the limit $\delta^{(p)}$ given in (5.2). From this, we also see that $\sqrt{n}\,\eta_{n,p}(p)$ converges to
$\delta_p^{(p)} + \psi_p$, which is finite for each $p > p_*$; for $p = p_*$, this limit is infinite in case
$|\psi_{p_*}| = \infty$. Note that the case where the limit of $\sqrt{n}\,\eta_{n,p_*}(p_*)$ is finite can only
occur if $p_* = \mathcal{O}$. It will now be convenient to prove Proposition 5.4 first.

Proof of Proposition 5.4. In view of Theorem 4.2, it suffices to consider
$\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(p)$. This model selection probability can be expanded as in (3.5)-(3.6)
with $\theta^{(n)}$ and $\sigma^{(n)}$ replacing $\theta$ and $\sigma$, respectively. Consider first the individual
$\Delta$-functions occurring in these formulas, i.e.,

(B.1)
$$\Delta_{\sigma^{(n)}\xi_{n,q}}\bigl(\sqrt{n}\,\eta_{n,q}(q),\ c_q\sigma^{(n)}\xi_{n,q}\bigr),$$

$\mathcal{O} < q \le P$. For $q > p_*$, recall that $\sqrt{n}\,\eta_{n,q}(q)$ converges to the finite limit $\delta_q^{(q)} + \psi_q$
as we have seen above, and it is elementary to verify that the expression in (B.1)
converges to $\Delta_{\sigma\xi_{\infty,q}}(\delta_q^{(q)} + \psi_q,\ c_q\sigma\xi_{\infty,q})$. For $q = p_*$ and $p_* > \mathcal{O}$, we have seen that
the limit of $\sqrt{n}\,\eta_{n,p_*}(p_*)$ is infinite, and it is easy to see that (B.1) with $p_*$ replacing
$q$ converges to zero in this case.

From the above considerations, it immediately follows that $\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(p)$
converges to the limit in (5.3) if $p > p_*$, and to the limit in (5.4) if $p = p_*$. To show that
$\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(p)$ converges to zero in case $p$ satisfies $\mathcal{O} \le p < p_*$, it suffices to observe
that here $\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(p)$ is bounded by the expression in (B.1) with $p_*$ replacing $q$.
As we have seen above, $\sqrt{n}\,|\eta_{n,p_*}(p_*)|$ converges to infinity, such that this upper
bound converges to zero as $n \to \infty$. □

Proof of Proposition 5.1. Again, it suffices to consider $G^*_{n,\theta^{(n)},\sigma^{(n)}}(t)$ in view of
Theorem 4.2. Recall that this cdf can be written as in (3.4). We first consider
the individual terms $G^*_{n,\theta^{(n)},\sigma^{(n)}}(t|p)\,\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(p)$ for $p = \mathcal{O}, \dots, P$. In case $p$
satisfies $\mathcal{O} \le p < p_*$, note that $\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(p) \to 0$ by Proposition 5.4. Hence,
$G^*_{n,\theta^{(n)},\sigma^{(n)}}(t|p)\,\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(p)$ converges to zero in total variation.

In the remaining cases, i.e., for $p$ satisfying $p_* \le p \le P$, it is elementary to
verify that Proposition 5.1 of Leeb [1] applies to $G^*_{n,\theta^{(n)},\sigma^{(n)}}(t|p)$, where the
quantity $\beta$ in that paper equals $A\delta^{(p)}$ in our setting. In particular, that result gives
the limit of the conditional cdf in the sense of weak convergence (because $\delta^{(p)}$ is
finite). Consider first the case $p > p_*$. From Proposition 5.4 we obtain the limit
of $\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(p)$. Combining the resulting limit expression with the limit
expression for $G^*_{n,\theta^{(n)},\sigma^{(n)}}(t|p)$ as obtained by Proposition 5.1 of Leeb [1], we see that
$G^*_{n,\theta^{(n)},\sigma^{(n)}}(t|p)\,\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(p)$ converges weakly to

(B.2)
$$\int_{z \le t - A\delta^{(p)}} \Bigl(1 - \Delta_{\sigma\zeta_{\infty,p}}\bigl(\delta_p^{(p)} + \psi_p + b_{\infty,p}z,\ c_p\sigma\xi_{\infty,p}\bigr)\Bigr)\, \Phi_{\infty,p}(dz)
\times \prod_{q=p+1}^{P} \Delta_{\sigma\xi_{\infty,q}}\bigl(\delta_q^{(q)} + \psi_q,\ c_q\sigma\xi_{\infty,q}\bigr).$$

In case $p = p_*$ and $p_* > \mathcal{O}$, we again use Proposition 5.1 of Leeb [1] and
Proposition 5.4 to obtain that the weak limit of $G^*_{n,\theta^{(n)},\sigma^{(n)}}(t|p_*)\,\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(p_*)$ is of the
form (B.2) with $p_*$ replacing $p$. Since $|\psi_{p_*}|$ is infinite, the integrand in (B.2) reduces
to one, i.e., the limit is given by

$$\Phi_{\infty,p_*}\bigl(t - A\delta^{(p_*)}\bigr) \prod_{q=p_*+1}^{P} \Delta_{\sigma\xi_{\infty,q}}\bigl(\delta_q^{(q)} + \psi_q,\ c_q\sigma\xi_{\infty,q}\bigr).$$

Finally, consider the case $p = p_*$ and $p_* = \mathcal{O}$. Arguing as above, we see that
$G^*_{n,\theta^{(n)},\sigma^{(n)}}(t|\mathcal{O})\,\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(\mathcal{O})$ converges weakly to

$$\Phi_{\infty,\mathcal{O}}\bigl(t - A\delta^{(\mathcal{O})}\bigr) \prod_{q=\mathcal{O}+1}^{P} \Delta_{\sigma\xi_{\infty,q}}\bigl(\delta_q^{(q)} + \psi_q,\ c_q\sigma\xi_{\infty,q}\bigr).$$

Because the individual model selection probabilities $\pi^*_{n,\theta^{(n)},\sigma^{(n)}}(p)$, $\mathcal{O} \le p \le P$, sum
up to one, the same is true for their large-sample limits. In particular, note that (5.1)
is a convex combination of cdfs, and that all the weights in the convex combination
are positive. From this, we obtain that $G^*_{n,\theta^{(n)},\sigma^{(n)}}(t)$ converges to the expression in
(5.1) at each continuity point $t$ of the limit expression, i.e., $G^*_{n,\theta^{(n)},\sigma^{(n)}}(t)$ converges
weakly. (Note that a convex combination of cdfs on $\mathbb{R}^k$ is continuous at a point $t$ if
each individual cdf is continuous at $t$; the converse is also true, provided that all the
weights in the convex combination are positive.) To establish that weak convergence
can be strengthened to convergence in total variation under the conditions given
in Proposition 5.1, it suffices to note, under these conditions, that $G^*_{n,\theta^{(n)},\sigma^{(n)}}(t|p)$,
$p_* \le p \le P$, converges not only weakly but also in total variation in view of
Proposition 5.1 of Leeb [1]. □